Abstract

Replication of eukaryotic positive-stranded RNA viruses is usually linked to the 
presence of membrane-associated replicative organelles. The purpose of this review is 
to discuss the function of proteins responsible for formation of the coronavirus 
replicative organelle. This will be done by identifying domains that are conserved 
across the order Nidovirales, and by summarizing what is known about function and 
structure at the level of protein domains.

Introduction

The order Nidovirales includes several families of large RNA viruses, arranged from 
longest to shortest genome as the Coronaviridae, Roniviridae, Mesoniviridae and 
Arteriviridae. The Coronaviridae currently contains the Torovirinae and Coronavirinae 
lineages, though analysis of recently reported divergent toro-like viruses suggests that 
the Torovirinae may be better represented as an independent family in the Nidovirales 
(Stenglein et al., 2014). The Coronavirinae currently contains four genera - Alpha- and 
Betacoronavirus that infect mammals, and Gamma- and Deltacoronavirus that infect 
birds and mammals. 

Members of the Nidovirales infect metazoan hosts, have several replicative genes in 
common (Lauber et al., 2013), express their structural genes via subgenomic mRNAs 
which usually join sequences from both genomic termini (Sawicki et al., 2007), express 
their replicases as polyproteins via a ribosomal frameshift, and replicate in association 
with viral transmembrane proteins on intracellular paired membranes (reviewed in
V'Kovski et al., 2015). Coronavirus growth is accompanied by a variety of intracellular
membrane rearrangements, as illustrated for the coronavirus Mouse hepatitis virus 
(MHV) in Fig. 1. Regions of these paired-membrane structures have been given a 
variety of names in the literature, including double-membrane vesicles (DMVs), 
convoluted membranes, spherules and zippered endoplasmic reticulum, but it is not clear
whether the different parts of the organelle have different functions. Furthermore, 
recent studies (Al-Mulla et al., 2014; Maier et al., 2016) suggest that there may be 
considerable plasticity and overlap among coronavirus paired membrane replicative 
structures. For this reason, it seems preferable to break with past practice of focussing 
on double-membrane vesicles in particular, to consider the double-membrane organelle 
(DMO) as a whole. The term DMO encompasses DMVs as well as all of the other 
associated paired-membrane structures like convoluted membranes and spherules that 
make up the viral replicative organelle. The term DMO will be used throughout this 
review except when specifically referring to the DMV component of the replicative 
organelle. This review summarizes what we know about coronavirus DMOs, and what 
bioinformatics can tell us about DMO-making proteins.

Origin of DMO Membranes

Membrane-bound replicative organelles are a widespread but not universal intracellular 
feature associated with positive-strand RNA virus replication (Neuman et al., 2014a). 
Coronaviruses form a double-membrane organelle (DMO) derived from the 
endoplasmic reticulum (ER) that contains viral RNA and replicase proteins (Deming et 
al., 2007; Hagemeijer et al., 2010; Oostra et al., 2007; Reggiori et al., 2010; Shi et al., 
1999; van der Meer et al., 1999). In the DMO, two lipid bilayers are held at a constant 
distance of about 20 nm (Angelini et al., 2013). Electron microscopy and tomography 
studies have revealed that the replicative organelles of coronaviruses and the related 
arteriviruses are drawn from a repertoire of paired-membranes, including open-ended 
spherules, closed double-membrane vesicles, and both planar and convoluted paired 
membranes (Knoops et al., 2012; Knoops et al., 2008; Maier et al., 2013). 

Nonstructural proteins nsp3, nsp4 and nsp6 are required to form structures similar to 
the DMOs observed in SARS coronavirus (SARS-CoV) infected cells (Angelini et al., 
2013). Because protease domains of nsp5 are required to release nsp4 and nsp6
from the polyprotein precursor, the region from nsp3 to nsp6 can collectively be thought 
of as the DMO-forming apparatus of coronaviruses. Phylogenetic analysis and 
comparison of domain architecture can be taken as evidence of homology of the 
coronavirus DMO-making proteins across the Nidovirales (Fig. 2). 

The formation of paired membranes probably involves interactions on both sides of the 
membrane, and perhaps within the membrane, and a few of these interactions have 
been confirmed biochemically. SARS-CoV nsp3-nsp3 interactions have been detected 
in cells by yeast two-hybridization (Pan et al., 2008) and GST pulldown (Imbert et al., 
2008), and in purified protein by perfluorooctanoic acid polyacrylamide gel 
electrophoresis (Neuman et al., 2008). While SARS-CoV nsp4-nsp4 interactions were
not found in yeast two-hybrid or mammalian two-hybrid screens (Pan et al., 2008; von
Brunn et al., 2007), studies with MHV did detect nsp4-nsp4 interactions by Venus
reporter fluorescence (Hagemeijer et al., 2011). To date, homotypic interactions have
not been demonstrated for nsp6 despite several attempts (Imbert et al., 2008; Pan et
al., 2008; von Brunn et al., 2007). Heterotypic interactions between coronavirus nsp3
and nsp4 have been observed by mammalian two-hybridization (Pan et al., 2008) and 
Venus reporter fluorescence (Hagemeijer et al., 2011), although because of differences 
in the parts of nsp3 that were used, these may actually represent two distinct modes of 
interaction between nsp3 and nsp4. Nsp4-nsp6 interaction has been demonstrated by 
Venus reporter fluorescence (Hagemeijer et al., 2011) and, indirectly, by the finding that
co-expression of nsp4 abrogates the vesicle accumulation caused by expression of nsp6 (Angelini
et al., 2013). Each of these proteins also potentially interacts with host proteins, which 
may have a downstream effect on pathogenesis (Pfefferle et al., 2011). 

Coexpression of nsp3 and nsp4 produces large areas of paired membranes, apparently 
arranged in parallel tubes (Angelini et al., 2013; Hagemeijer et al., 2014). In terms of 
topology, nsp3-4 membrane pairing involves linking opposite sides of the ER across the 
ER lumen, so it is likely that the luminal domains of both proteins are involved in this 
interaction. Membrane pairing would not necessarily need to involve high-affinity 
interactions, as demonstrated by a study where paired membranes were induced by 
low-affinity interactions between membrane-linked green fluorescent protein (Snapp et 
al., 2003). The minimal requirements for DMO-like membrane pairing appear to be the 
C-terminal region of nsp3 that contains both transmembrane regions and the luminal 
ectodomain (Hagemeijer et al., 2014), and at least the N-terminal region of nsp4 
including the first three transmembrane regions of nsp4 (Sparks et al., 2007). Mutations 
in the glycosylated luminal domain of nsp4 result in either non-recoverable virus or less 
consistent membrane pairing (Gadlage et al., 2010). While the final transmembrane region
and C-terminal cytosolic domain of nsp4 are dispensable for coronavirus replication (Sparks et
al., 2007), it is not clear whether partial deletion of nsp4 affects DMO structure. These
observations together suggest that coronavirus membranes most likely pair through 
heterotypic interactions involving the luminal domains of nsp3 and nsp4, though 
interactions between cytosolic domains that lead to nsp3 and nsp4 clustering may also 
be important for membrane pairing.

The ER is the likely source of coronavirus DMO membranes, which may be obtained by
co-opting the ER-associated degradation (ERAD) tuning pathway, a cellular degradation 
pathway that is responsible for the turnover of misfolded proteins in the ER (Reggiori et 
al., 2010). The ERAD tuning pathway is modulated by stress-inducible positive 
regulators of protein disposal such as EDEM1 (ER degradation-enhancing alpha 
mannosidase-like 1) and OS-9 (osteosarcoma amplified 9), which assist in transporting 
misfolded proteins into the cytosol for proteasomal degradation. Under physiological 
conditions, low concentrations of EDEM1 and OS-9 are maintained in the ER lumen in 
order to avoid premature degradation of proteins that are in the process of folding (Cali 
et al., 2008). In this case, EDEM1 and OS-9 are selectively confined by interacting with 
the transmembrane-anchored cargo receptor SEL1L (suppressor of lin-12-like) and later 
released from the ER lumen in small short-lived vesicles, called EDEMosomes, which 
rapidly fuse with the endolysosomal compartments (Bernasconi et al., 2012). In infected 
cells, viral double-stranded RNA colocalizes with EDEM1, OS-9, SEL1L and LC3-I, 
which is recruited to autophagosomes. Moreover, replication of MHV, which does not 
require an intact autophagy pathway, is impaired upon knockdown of LC3 or SEL1L 
(Bernasconi et al., 2012). Taken together, this is evidence that MHV exploits the ERAD- 
tuning machinery to establish DMOs for replication. A summary of nsp3-6 interactions 
and induced membrane rearrangements is shown in Fig. 3. 

Nsp3 

Coronavirus nsp3 is a large multidomain protein ranging from around 1450 amino acid 
residues in Deltacoronavirus (Woo et al., 2012) to nearly 2100 amino acid residues in 
the unpublished Hipposideros pratti betacoronavirus-Zhejiang2013 (GenBank accession
NC_025217). Most nsp3s are predicted to be ~200 kDa, and are cleaved from
polyprotein 1a or 1ab by papain-like proteases (PLpro) that are encoded within nsp3.

To make it easier to discuss specific parts of such a large protein in terms of both 
structure and function, we previously published a domain-level annotation of nsp3 
(Neuman et al., 2008), which we have updated for this review (Fig. 4). 
Based on phylogenetic analysis of nidovirus nsp3 homologues, results from previously
published studies (Gorbalenya et al., 2006; Ratia et al., 2006; Saikatendu et al., 2005; 
Serrano et al., 2007b; Thiel et al., 2003; Ziebuhr et al., 2001) and de novo domain 
prediction software (Jaroszewski et al., 2005), we estimate that the full repertoire of 
sequenced coronavirus nsp3 genes encodes 18 domains, with individual viruses having 
10-16 domains each. Several of these domains are duplicated, including two
ubiquitin-like domains (Ub1 and Ub2), two PLpro domains, and three macrodomains (Mac1,
Mac2 and Mac3). Ten of these domains form the core of coronavirus nsp3, and are found in
every currently known coronavirus: both ubiquitin-like domains, the second PLpro, the
first macrodomain Mac1, a hypervariable region consisting of mostly acidic residues,
the transmembrane regions TM1 and TM2, the nsp3 ectodomain
(3Ecto), a nidovirus-conserved domain of unknown function (Y) and a region predicted
to contain two structural domains which are only found in coronaviruses (CoV-Y). Six of
the ten domains conserved in all coronaviruses are also found in other members of the 
Nidovirales, with evidence that the region from TM1 to Y is present in all nidoviruses 
except the two families that infect arthropods, namely the Roniviridae and Mesoniviridae 
(V'Kovski et al., 2015). 

The ectodomain of nsp3, 3Ecto, is glycosylated in SARS-CoV at positions 1431 and
1434 (Harcourt et al., 2004) and in the corresponding region of MHV (Kanjanahaluethai et
al., 2007), and is predicted to be located on the luminal side of the membrane. Each
copy of nsp3 is predicted to span the membrane twice, placing the first 1395 residues of
SARS-CoV nsp3 and the last 377 residues (Y and CoV-Y) on the cytosolic face of the
membrane. Notably, the regions immediately before TM1 and after TM2, which would
both have a cytosolic membrane topology, are highly hydrophobic, and may serve to link
the pre- and post-transmembrane regions of nsp3.

Nsp3 (Ub1 to PL1pro)

The N-terminal region of coronavirus nsp3, containing Ub1, a hypervariable
region (HVR) and PL1pro, appears poorly conserved at first glance, showing less than
20% average amino acid identity between members of different coronavirus genera
(Fig. 5), but secondary structure prediction (JPred; Drozdetskiy et al., 2015) suggests
that the Ub1 and PL1pro domains adopt a conserved fold in all coronaviruses (data not
shown). The NMR structure of residues 1-112 of SARS-CoV nsp3 exhibits a
globular ubiquitin-like fold with two additional helices, which make the overall structure of
the Ub1 domain somewhat more elongated than other ubiquitin-like proteins (Serrano et
al., 2007b). In contrast, the following HVR was shown to be structurally disordered for
SARS-CoV (Serrano et al., 2007b) and is dispensable for replication in MHV (Hurst et
al., 2013).
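
As a rough illustration of what a figure such as "less than 20% amino acid identity" measures, the sketch below computes pairwise identity from two pre-aligned sequences. It is only a minimal example: the function name and the choice to ignore gapped columns in the denominator are assumptions made here, and the review does not state which convention or software was used for Fig. 5.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption (not from the review): columns where either sequence
    has a gap ('-') are skipped, so the denominator is the number of
    columns in which both sequences have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # columns where those residues match
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared
```

Other denominators (alignment length, or the length of the shorter sequence) are also in common use and give somewhat different values, so identity figures are only comparable when the same convention is applied throughout.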

While the function of Ub1 has only been investigated in MHV, this domain has been
found to play an essential role in initiating viral RNA synthesis, where it interacts with
the viral nucleoprotein (N; Hurst et al., 2013; Hurst-Hess et al., 2015). This was
demonstrated in experiments that attempted to delete nsp3 domains, or substitute them with
the corresponding domains from other coronaviruses. The interaction of Ub1 with N
could effectively tether nsp3 to viral RNA during the replication process, but further
experimentation is needed to better understand how the nsp3-N interaction leads to
more efficient viral RNA synthesis.

Additionally, Ub1 has an extra alpha helix and a 3-10 helix that are unusual for ubiquitin
folds in general (Serrano et al., 2007a). Among the closest structural matches to
SARS-CoV Ub1 is one of the ubiquitin-like domains of ISG15, an interferon-induced
protein constitutively present in higher eukaryotes. This has led to speculation that Ub1
may be involved in modulating the effects of intracellular immunity in a manner
analogous to the immunomodulatory decoys of poxviruses (Johnston and McFadden,
2003). ISG15 is known to inhibit virus replication by abrogating nuclear processing of
unspliced viral RNA precursors, but some viruses have developed mechanisms to avoid
the expression of ISG15. For example, Influenza B virus blocks its expression by means
of the NS1 protein in order to overcome the immune response. The PL2pro domain of
SARS-CoV nsp3 also recognizes and cleaves ISG15 (Lindner et al., 2007), which could
potentially modulate the intracellular response to infection (Morales et al., 2015).
However, the comparison with ISG15 remains speculative because an
immunomodulatory function for Ub1 has not yet been demonstrated experimentally.

When SARS-CoV Ub1 was expressed in E. coli, it was found to bind tightly to a small
RNA fragment that mass spectrometry analysis revealed to be consistent with the
sequence GAUA or GUAA (Serrano et al., 2007a). While matching sequences can be
found throughout the SARS-CoV genome, none are prominently located in regions of the 5’-UTR
or 3’-UTR that are known to contain sequences essential for recognition of viral RNA by
components of the replicase. The functional significance of RNA binding by Ub1
therefore remains unknown, but it may complement the Ub1-N interaction.
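
The observation that GAUA and GUAA occur throughout the genome can be checked with a simple scan such as the one sketched below. This is an illustrative helper only: the function name, the 1-based coordinates and the handling of DNA-style input (T converted to U) are assumptions made for this example, not part of the cited analysis.

```python
def find_tetramers(rna: str, motifs=("GAUA", "GUAA")):
    """Return (1-based start position, motif) for every motif occurrence.

    Overlapping occurrences are reported. Input given as cDNA (with T)
    is converted to RNA (with U) before matching.
    """
    seq = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if window in motifs:
            hits.append((i + 1, window))
    return hits
```

Because any given tetramer is expected roughly once every 256 nucleotides in random sequence, matches across a genome of about 30 kb are unsurprising, which is consistent with the point that position rather than mere presence would be needed to argue for functional relevance.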

A papain-like protease domain (PL1pro) follows the HVR domain in some coronaviruses,
but is absent in SARS-CoV and Middle East respiratory syndrome coronavirus
(MERS-CoV). Where present, PL1pro generally cleaves at the N-terminal boundary of nsp3, but
in viruses that have only one PLpro, this cleavage is carried out by PL2pro (Hilgenfeld,
2014). A transcription factor-like zinc finger is conserved in all complete coronavirus
PLpro domains (Culver et al., 1993), which was taken as an early indication that nsp3
might be involved in coronavirus RNA synthesis. This hypothesis is supported by a
report in which the Equine arteritis virus nonstructural protein 1, which is structurally and
enzymatically similar to coronavirus PL1pro (Sun et al., 2009), was shown to be
indispensable for viral subgenomic mRNA synthesis (Phizicky and Greer, 1993).

Some strains of Infectious bronchitis virus and Hipposideros pratti betacoronavirus- 
Zhejiang2013 contain only partial papain-like protease domains that are lacking one of 
the two catalytic domains (Fig. 2). It is not clear whether these domains are able to 
interact with the missing domains from PL2pro and remain functional, or whether these
are inactive relict domains derived from an ancestral genome that are no longer
necessary for viral growth and are in the process of being deleted. 

Nsp3 (Mac1 to DPUP)

The next conserved domain is the first of up to three domains that, in SARS-CoV, adopt
macrodomain folds similar to that of the histone macroH2A (Saikatendu et al., 2005). This domain
was originally called an X-domain (Gorbalenya et al., 1991), an alternate name that is
still used in some publications. The structure of Mac1 has been solved for several
coronaviruses (Piotrowski et al., 2009; Saikatendu et al., 2005; Tan et al., 2009;
Wojdyla et al., 2009; Xu et al., 2009). Purified SARS-CoV Mac1 was shown to have
relatively weak ADP-ribose 1"-phosphatase (ADRP) activity (Saikatendu et al., 2005).
Enzymatic activity of Mac1 was predicted to be due to His45 based on comparison with
the homologous proteins Ymx7 from yeast, AF1521 from Archaeoglobus fulgidus and Er58
from E. coli. The active site was verified by site-directed mutagenesis data in the Human
coronavirus 229E (HCoV-229E) Mac1, which suggested that the corresponding
residues Asn37, Asn40, His45, Gly44 and Gly48 of SARS-CoV Mac1 are part of the
active site (Putics et al., 2005). Both the SARS-CoV ADRP and the HCoV-229E
counterpart dephosphorylate ADP-ribose 1"-phosphate to ADP-ribose in a highly
specific manner, the enzyme having no detectable activity on several other nucleoside
phosphates (Putics et al., 2005). Characterization of an ADP-ribose 1"-phosphatase-deficient
HCoV-229E mutant revealed no significant effects on viral RNA synthesis and
virus titer (Putics et al., 2005), but mutation of the Mac1 active site altered MHV
pathogenicity, suggesting that this domain may play an immunomodulatory role by
interacting with unknown host factors (Kuri et al., 2011).

The existence of an ADRP-like domain in all CoV nsp3s (as well as in several other 
RNA viruses) suggests that ADP ribose phosphatase activity confers some sort of 
advantage to coronaviruses. Egloff et al. suggested that Mac1 may primarily be a
poly-ADP-ribose binding (PAR-binding) module, rather than an ADP-ribose cleaving enzyme
(Egloff et al., 2006). PARylation occurs when PAR polymerases are activated, often in
compromised cells, to trigger apoptosis or DNA repair. Perhaps more relevant to
coronavirus infection, the formation of cytoplasmic stress granules, which are
aggregates of stalled cellular translation complexes, appears to be regulated by
PARylation (Leung, 2014). Further research is needed to determine whether
coronavirus ADRP expression affects RNA granule formation or stability. However, the
Infectious bronchitis virus Mac1 does not have detectable ADP-ribose binding activity,
suggesting that Mac1 domains may have other important functions that are unrelated to
ADP-ribose (Piotrowski et al., 2009).

In SARS-CoV, Mac1 is followed by two more macrodomains (Chatterjee et al., 2009;
Tan et al., 2009) that form part of the region that was originally called the SARS-CoV 
Unique Domain (SUD). However, in light of more recent structural evidence that the 
SUD contains not one but three distinct structural domains, and phylogenetic evidence 
that some SUD domains are not unique to SARS-CoV, a revision to the name of this 
region is necessary (Chen et al., 2015). This region was originally divided into
N-terminal and middle domains, which appear in the literature as SUD-N and SUD-M but are
here renamed as Mac2 and Mac3 to reflect their conservation outside SARS-CoV,
followed by a small C-terminal domain known as DPUP for its position as the Domain
Preceding Ub2 and PL2pro. The structures of both Mac2 and Mac3 are close structural
matches for the SARS-CoV Mac1 domain despite a lack of detectable amino acid
homology between these proteins (Tan et al., 2009). The presence of these additional
macrodomain folds has also been confirmed by the NMR structure of the complete 
Mac2-Mac3-DPUP (Johnson et al., 2010) and the NMR structure of SUD-M (Chatterjee 
et al., 2009). The SARS-CoV DPUP domain contains a novel fold that consists of an 
antiparallel beta sheet (Johnson et al., 2010) that was surprisingly also found in the 
corresponding region of MHV despite negligible evidence of homology based on 
alignment of amino acid sequences (Chen et al., 2015). 

All three of the domains that make up the SUD have been demonstrated to interact with 
nucleic acid in some way. Expressed Mac2-Mac3 has a high affinity for G-rich 
sequences and G-quadruplexes (Tan et al., 2009), while the Mac3-DPUP showed a 
general preference for purine nucleotides (Johnson et al., 2010). Notably, while Mac2 
and Mac3 domains bear a close structural resemblance to the SARS-CoV Mac1 ADRP
domain, neither domain has any demonstrable affinity for ADP-ribose (Tan et al., 2009).
The amino acid residues responsible for Mac3 and DPUP RNA binding have been
mapped, and appear to fall near the region of Mac3 that corresponds to the active site
in the structurally similar Mac1 domain (Chatterjee et al., 2009). Together this suggests
that the cluster of three macrodomains in SARS-CoV nsp3 arose through gene
duplication and that additional macrodomains may contribute to the function of nsp3 as
an accessory to the viral replication process (Neuman et al., 2008). Interestingly, in the
context of a SARS-CoV infectious cDNA clone, Mac1 and Mac2 were dispensable but
Mac3 was essential for RNA replication (Kusov et al., 2015). 

Nsp3 (Ub2 and PL2pro)

Unlike many coronaviruses that encode two papain-like proteases, SARS-CoV has a
single copy of papain-like cysteine protease (PL2pro) that cleaves polyprotein 1a at three
sites at the N-terminus to release nsp1, nsp2, and nsp3, respectively (Harcourt et al.,
2004; Thiel et al., 2003). However, another important role for SARS-CoV PL2pro may be
linked to its deubiquitinating activity; it efficiently disassembles diubiquitin and branched
polyubiquitin chains, cleaves 7-amino-4-methylcoumarin-conjugated ubiquitin
substrates, and has de-ISGylating activity (Chen et al., 2007; Lindner et al., 2005).

Thus, PL2pro may have critical roles not only in proteolytic processing of the replicase
complex but also in subverting cellular ubiquitination machinery to facilitate viral 
replication (Bailey-Elkin et al., 2014; Mielech et al., 2015; Mielech et al., 2014), as 
demonstrated for the arterivirus Equine arteritis virus (van Kasteren et al., 2013). 

PL2pro is preceded by a second ubiquitin-like domain (Ratia et al., 2006). The protease
catalytic domain adopts the canonical “thumb, palm and fingers” domain architecture.
Two beta-hairpins at the fingertips region contain four cysteine residues, which
coordinate a zinc ion. Mutational analysis of the zinc-coordinating cysteines of SARS-CoV
PLpro showed that zinc-binding ability is essential for structural integrity and
protease activity (Barretto et al., 2005). PL2pro has several structural homologues from
the cysteine protease superfamily that are cellular deubiquitinating enzymes. The active
site of PLpro consists of a catalytic triad of cysteine, histidine, and aspartic acid residues,
consistent with catalytic triads found in many PLpro domains. Comparison of the SARS-CoV
PL2pro structure with the structure of TGEV PL1pro demonstrates that the
coronavirus-like PLpro folds have a common architecture (Wojdyla et al., 2010), and
likely arose through gene duplication.

It has been demonstrated that an L/I-X-G-G motif at the (pre-cleavage) P4-P1 positions
of the substrate is essential for recognition and cleavage by Betacoronavirus PL2pro
(Barretto et al., 2005; Han et al., 2005). There appear to be no preferences for the
post-cleavage positions or for residues before P4. It is not surprising then that SARS-CoV
PLpro is able to cleave after the four C-terminal residues of ubiquitin, LRGG. As
predicted (Sulea et al., 2005), SARS-CoV PL2pro possesses de-ubiquitinating activity
(Barretto et al., 2006; Lindner et al., 2005) in addition to cysteine protease activity
involved in viral polyprotein processing. The specific deubiquitinating enzyme inhibitor,
ubiquitin aldehyde, inhibited its activity with a Ki of 210 nM (Lindner et al., 2005).
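
The P4-P1 recognition rule described above lends itself to a simple pattern search. The sketch below lists candidate L/I-X-G-G sites in a protein sequence; it is a minimal illustration, with the function name and the 1-based P1 coordinate convention chosen here as assumptions. A sequence match only marks a candidate site, since accessibility and structural context are not modelled.

```python
import re

# P4 = L or I, P3 = any residue, P2 = G, P1 = G.
# The lookahead lets overlapping candidate sites be reported.
LXGG = re.compile(r"(?=([LI].GG))")


def find_lxgg_sites(protein: str):
    """Return (P1 position, motif) for each L/I-X-G-G match.

    Positions are 1-based; cleavage would occur immediately after
    the reported P1 residue.
    """
    seq = protein.upper()
    return [(m.start() + 4, m.group(1)) for m in LXGG.finditer(seq)]
```

Applied to a sequence ending in LRGG, such as the C-terminus of ubiquitin, the function reports the final glycine as P1, matching the description of cleavage after the last four residues.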

Interestingly, a number of cellular deubiquitinases, including full-length USP14 and 
Ubp6, possess an N-terminal ubiquitin-like domain. Although the significance of this 
domain in these proteins is not well established, it has been demonstrated that the 
presence of the ubiquitin-like domain in USP14 and Ubp6 serves a regulatory function 
by mediating interactions between these deubiquitinases and specific components of 
the proteasome (Hu et al., 2005; Leggett et al., 2002). Comparisons of deubiquitinase 
activities between wild-type and mutant Ubp6 lacking the ubiquitin-like domain reveal 
that these associations are responsible for a 300-fold increase in catalytic rate and serve 
to activate the enzyme (Leggett et al., 2002). It is intriguing to consider whether the 
ubiquitin-like domain of PLpro may have a similar function.

While the role of PL2pro in polyprotein processing is well understood, the physiological
significance of its deubiquitinating activity in the viral replication cycle is still not
completely clear. However, the conserved structural protein E is readily ubiquitinated in
infected cells, suggesting that control of ubiquitination may be important for SARS-CoV
assembly (Alvarez et al., 2010). Mounting evidence suggests that PL2pro interferes with
interferon transcriptional activation pathways by inactivating TBK1, blocking NF-kappaB
signaling and preventing translocation of IRF3 to the nucleus (Frieman et al., 2009;
Wang et al., 2011; Zheng et al., 2008). In agreement with this proposed function, an
arterivirus PL2pro was shown to play an important role in immune evasion (van Kasteren
et al., 2013).

Nsp3 (NAB to CoV-Y) 

The region between the PL2pro domain and the transmembrane region of nsp3 does not
show obvious sequence similarity to any known domain that would give a clue to its
function. However, an NMR structural study revealed that the nucleic acid binding (NAB)
domain is an independently folded unit capable of binding RNA with relatively high
affinity, and with duplex-unwinding activity reminiscent of an RNA chaperone (Neuman
et al., 2008). The NAB, along with several other domains of nsp3, has been
demonstrated by perfluorooctanoic acid polyacrylamide gel electrophoresis to form
homodimers upon incubation at 37 °C (Neuman et al., 2008). Little else is known about the
function of NAB in the viral replication cycle, or about the structure and function of the
betacoronavirus-specific marker (βSM) domain that follows, or the conserved
hydrophobic, non-transmembrane region that immediately precedes the first
transmembrane region of nsp3.

The region from the first transmembrane helix to the carboxyl terminus of nsp3 was
originally annotated as the Y-domain (Gorbalenya et al., 1991), in the sense that it
followed the X-domain that we now refer to as Mac1. We have subdivided this region
into two transmembrane regions, an ectodomain (3Ecto), a widely-conserved initial
domain (Y1), and an apparently coronavirus-specific carboxyl-terminal domain (CoV-Y).
The 3Ecto, Y1 and CoV-Y domains are highly conserved in all CoVs, but no protein
structures are available at this time for any part of the Y-domain, and the
domain assignment in this region may change as new structures appear. A Fold and
Function Annotation System search (FFAS; Jaroszewski et al., 2005) using the
sequence from the SARS-CoV NAB to the end of nsp3 returns seven significant hits
(with expect values of -8 or better), three of which are to viral RNA-dependent RNA
polymerase (RdRp) proteins. This may hint at the evolutionary origin of nsp3, which
comprises nearly one fifth of most coronavirus genomes. The level of conservation in the
Y1 domain in particular approaches levels consistent with the other enzymatic domains of
nsp3, and exceeds the conservation of other domains that are believed to be
non-enzymatic (Neuman et al., 2008), but the function and structure of this region remain
to be explored (Fig. 5).

It appears that domains from PL2pro to the CoV-Y domain have not undergone
significant deletion or rearrangement during coronavirus evolution, while other nsps like
nsp1, nsp2, and the N-terminal regions of nsp3 clearly have evolved by duplication and
deletion of domains (Neuman et al., 2014b; Neuman et al., 2008). The C-terminal
portion of nsp3 has been shown to change the localization of nsp4 (Hagemeijer et al.,
2011), and cause a membrane proliferation phenotype in transfected cells (Angelini et
al., 2013). The topology of nsp3 leaves only one domain, annotated here as the 
ectodomain of nsp3, or 3Ecto, on the luminal side of the membrane. If nsp3 participates 
directly in the membrane pairing exhibited in cells transfected with SARS-CoV nsp3 and 
nsp4 (Angelini et al., 2013), then the 3Ecto domain likely helps mediate membrane 
pairing directly. We previously noted that a cluster of cysteine and histidine residues in 
3Ecto may coordinate a metal ion, but analysis of newly sequenced viruses shows that 
the Cys-His cluster is not conserved in all coronaviruses, hence this domain has been 
renamed in the annotation presented here.

Nsp4

Nsp4 is a transmembrane protein with four transmembrane helices and a cytosolic C- 
terminal domain (Oostra et al., 2007). Coronavirus nsp4 is approximately 500 amino 
acid residues in length, and is the only part of the viral polyprotein that is released after 
processing by both the PLpro and Mpro. The location and topology of the four
transmembrane regions have been mapped (Oostra et al., 2007), but only the C-terminal
portion of nsp4 appears to be conserved throughout the Nidovirales (Fig. 6). 

The C-terminal domain of nsp4 is conserved in all known coronaviruses, but deletion of 
this domain, including the fourth transmembrane helix, from an MHV infectious clone
resulted in only slightly attenuated growth, consistent with a non-essential function
(Sparks et al., 2007). It is somewhat surprising that the C-terminal domain of nsp4 is
dispensable for replication since other nidoviruses with the exception of the
Mesoniviridae contain a domain at this position with a similar predicted structure 
(V'Kovski et al., 2015). Mutation of the two glycosylation sites in nsp4, however, led to 
defective DMV formation and attenuation (Gadlage et al., 2010). The structure of the C- 
terminal 4Endo domain of Feline coronavirus has been reported. 4Endo consists of two 
small antiparallel beta-sheets and four alpha-helices (Manolaridis et al., 2009), and is 
highly conserved among coronaviruses at the amino acid level (Fig. 6). 

SARS-CoV nsp4 is an essential component for the formation of viral double-membrane
vesicles (Angelini et al., 2013). Intracellular expression studies have demonstrated a 
biological interaction between the carboxyl-terminal region of MHV nsp3 and nsp4 
(Hagemeijer et al., 2011), and co-expression of full-length SARS-CoV nsp3 and nsp4 
results in extensive membrane pairing, in which the paired membranes are held at the 
same distance as observed in authentic DMVs (Angelini et al., 2013). Nsp4 has also 
been shown to interact with nsp2 in a yeast two-hybrid screen (von Brunn et al., 2007), 
and to interact with other nsp4 molecules in cells (Hagemeijer et al., 2011). 

Mutations in the 4Ecto domain of nsp4 that lead to a loss of nsp4 glycosylation have
been shown to cause aberrant DMV formation (Gadlage et al., 2010; Sparks et al.,
2007).


Nsp5 

Coronavirus nsp5 is also known as the main protease (Mpro), a chymotrypsin-like
protease related to the enteroviral 3C protease. It belongs to the C30 family of
endopeptidases and is responsible for cleavage at 11 sequence-specific sites within
polyprotein 1a/1ab. The resultant “mature” protein products (nsp4 to nsp16) assemble
into components of the replication complexes (reviewed in (Hilgenfeld et al., 2006;
Ziebuhr et al., 2000)). Mpro is one of only three parts of the coronavirus replicase, along
with the nsp12 polymerase and nsp13 helicase regions, that is conserved throughout 
the Nidovirales (Lauber et al., 2013). 

Based on both structure and sequence characteristics, nsp5 can be divided into three
domains: a two-domain active region (Dom I and II) and a third domain that plays a role
in nsp5 dimerization (Dom III; (Yang et al., 2003)). The three-domain architecture is
conserved in all coronaviruses, all nidoviral groups and several other RNA viruses that
share a common polyprotein processing scheme (Ziebuhr et al., 2000). The sequence is
related to the chymotrypsin-like protease superfamily of endopeptidases (Murzin et al.,
1995).

The first seven residues at the N terminus play a critical role in dimerization and lie
close to the active site, making this enzyme an obligate dimer (Anand et al.,
2002), although modification of the termini appears to modulate higher order
oligomerization (Zhang et al., 2010). Deletion of the first five amino acid residues results
in complete inactivation of this enzyme. The helical C-terminal domain III mediates
homodimerization of coronaviral Mpro proteases. This interaction is believed to be
important for its trans-proteolytic activity. The active site is located at the interface of the
two beta-barrels, with the catalytic residues H41 and C145 being contributed by domains
I and II respectively in the SARS-CoV Mpro.

As part of its proteolytic activity, Mpro is expected to interact with all the non-structural
proteins from nsp4 to nsp16, presumably near its catalytic site. Interestingly,
comparison of all the known sites where mutation can lead to a temperature-sensitive,
RNA-negative phenotype suggests a close connection to Mpro (Fig. 7). To date, the only
proteins where temperature-sensitive mutations have been discovered are nsp3, nsp5,
nsp10, nsp12, nsp14 and nsp16 (Al-Mulla et al., 2014; Sawicki et al., 2005; Stokes et
al., 2010), all of which make fewer or smaller DMVs compared to wild-type virus (Al-
Mulla et al., 2014). Nsp10 interacts with nsp5 directly (Imbert et al., 2008), and both
nsp10 and nsp3 mutations paradoxically inhibit Mpro activity (Donaldson et al., 2007;
Stokes et al., 2010). One interactome study showed that nsp12 and nsp14 also directly
interact with nsp5 (Pan et al., 2008), and nsp14 and nsp16 would also indirectly interact
with nsp5 as part of the nsp10-14-16 complex (Bouvet et al., 2010; Decroly et al., 2011;
Imbert et al., 2008; Pan et al., 2008). Together this suggests that nsp5 plays a critical
role in both RNA replication and DMV formation, likely by proteolytically releasing nsp4
and nsp6.

Nsp6 

Nsp6 has six transmembrane regions, with both termini located on the cytosolic side of 
the membrane (Oostra et al., 2008). Although most coronavirus nsp6 proteins are 
predicted by TMHMM 2.0 (Krogh et al., 2001) to contain seven transmembrane regions,
only six of these function as membrane-spanning helices. The presence of additional
non-transmembrane hydrophobic domains near authentic transmembrane domains is a
common theme running through the DMO-making proteins. IBV and SARS-CoV nsp6
have been shown to activate autophagy, inducing vesicles containing Atg5 and LC3-II
(Cottam et al., 2011). MHV nsp6 colocalizes with nsp4 when co-expressed
(Hagemeijer et al., 2012), suggesting that the two proteins may interact. SARS-CoV
nsp6 has also been shown to interact with nsp2, nsp8, nsp9 and accessory protein 9b
via yeast two-hybrid assays (von Brunn et al., 2007). It is notable that both the 4Endo
and 6Endo domains are nearly as well conserved in coronaviruses as the catalytic
domain of Mpro (Fig. 5).

Overexpression of nsp6 disturbs intracellular membrane trafficking (Cottam et al.,
2011), resulting in an accumulation of single-membrane vesicles around the microtubule
organizing center (Angelini et al., 2013). However, coexpression with nsp4 prevents
the vesicle accumulation caused by nsp6 expression, suggesting that nsp4 and nsp6 interact.
While nsp6 was not necessary for membrane pairing, it was essential for forming the
DMVs that are characteristic of coronavirus replicative organelles.

After serial passage of Human coronavirus 229E virus in the presence of the 
experimental antiviral K22, resistant viruses carrying mutations in nsp6 were selected 
(Lundin et al., 2014). K22 showed antiviral effects in the high nanomolar to low 
micromolar range, requiring higher amounts to achieve an antiviral effect than would 
normally be considered practical for wider clinical use. Time of addition and removal 
studies showed that K22 was most effective early in infection, after entry, consistent 
with effects on establishment of viral replication complexes or direct interference with 
the process of RNA replication (Lundin et al., 2014). Surprisingly, two independently 
isolated resistance mutations mapped to opposite ends of transmembrane helices in 
nsp6 at positions H121L and M159V. The resistant viruses released similar amounts of
new progeny compared to wt, but produced only about half as many DMVs per infected
cell, confirming the importance of nsp6 for authentic DMV formation. In addition, the
DMVs induced by resistance mutants appeared structurally impaired. Similarly to MHV
nsp4 mutants (Beachboard et al., 2015; Gadlage et al., 2010), K22 escape mutants
induced DMVs with partially collapsed inner membranes, even when K22 was not
present. Moreover, the specific infectivity of those newly released virions was about
ten-fold lower for nsp6 mutants than for wt. This suggested that the mutations in nsp6
conferred resistance to K22 at a cost of impairing an early intracellular step in the 
establishment of infection. 

DMOs and viral replication fitness 

While the structure of coronavirus DMOs has become increasingly clear, their purpose 
remains somewhat mysterious. Mutations in nsp4 of MHV resulted in decreased
replication and competitive fitness (Beachboard et al., 2015). MHV mutants that
produced fewer DMVs, but equal or greater amounts of genomic and subgenomic viral 
RNA have been described (Al-Mulla et al., 2014), suggesting that coronavirus 
replication is not strictly dependent on high numbers of DMVs or the size of the DMV 
interior. Notably, when equal infectivities of two viruses were added to the same flask at 
a temperature where both viruses could grow normally, several mutants with small DMV 
and low-DMV phenotypes did not appear to be at a competitive disadvantage compared 
to wild-type virus (Al-Mulla et al., 2014). This result was replicated in several 
continuous cell lines and primary cells. Furthermore, a survey of Infectious bronchitis 
virus revealed that one high-pathogenicity strain produced abnormally low numbers of 
double-membrane spherules, while still producing an equivalent amount of RNA to a 
vaccine strain (Maier et al., 2016). These results suggest that whatever their purpose, 
coronavirus replicative organelles display a surprising degree of structural plasticity 
without necessarily impairing RNA production, pathogenicity or competitive fitness. 

Figure Legends

Figure 1. Intracellular changes following coronavirus infection. Electron 
micrograph of an ultrathin section of a murine DBT cell infected with Mouse hepatitis 
virus. Double-membrane vesicles (red) and convoluted membranes (purple) associated 
with viral RNA synthesis can be seen at left and new virions (blue) can be seen 
associated with Golgi membranes at right. 

Figure 2. Organization and conservation of genes involved in DMO formation 
throughout the Nidovirales. (A) A schematic of the SARS coronavirus polyprotein 
showing the organization of nonstructural proteins (nsp) and the location of components 
important for RNA synthesis and replicative organelle formation. Sites where the 
polyprotein is cleaved by the main protease (Mpro) are marked with black triangles, and
cleavage sites of nsp3 proteases are marked by grey triangles. The RNA-dependent 
RNA polymerase (RdRp), superfamily 1 helicase (Hel), exonuclease (Exo), 
endonuclease (Endo), and the two viral cap methyltransferases (NMT, OMT) are 
shown. (B) Organization of the proteins most likely to be involved in formation of 
replicative organelles across the order Nidovirales. Conserved domains were identified 
by amino acid alignment, or by comparison of predicted protein secondary structure. 
Domain function was predicted by HHpred (Söding et al., 2005). Potential cleavage
sites were annotated by analogy to known cleavage sites and proximity to conserved
domains. Clusters with 4 histidine or cysteine residues within 25 amino acid residues
that may bind metal ions are indicated. Conserved transmembrane regions are shown in
black, and regions predicted as transmembrane helices by TMHMM 2.0 (Sonnhammer et
al., 1998) but shown experimentally not to be transmembrane are zebra striped.
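
The Cys/His cluster criterion in this legend can be illustrated with a short sliding-window scan. This is a minimal sketch, not the procedure used to prepare the figure: it assumes that "within 25 amino acid residues" means a span of at most 25 consecutive positions, and the function name find_cys_his_clusters and its parameters are hypothetical.

```python
def find_cys_his_clusters(seq, min_count=4, window=25):
    """Return 0-based, half-open (start, end) spans in which at least
    `min_count` Cys/His residues fall within `window` consecutive positions.

    Assumption: the window is measured inclusively from the first to the
    last C/H residue of the group. Overlapping hits are merged.
    """
    # Positions of every cysteine (C) or histidine (H) in the sequence.
    positions = [i for i, aa in enumerate(seq.upper()) if aa in "CH"]
    clusters = []
    for k in range(len(positions) - min_count + 1):
        first = positions[k]
        last = positions[k + min_count - 1]
        if last - first + 1 <= window:
            if clusters and first < clusters[-1][1]:
                # Extend the previous cluster when the new hit overlaps it.
                start, end = clusters[-1]
                clusters[-1] = (start, max(end, last + 1))
            else:
                clusters.append((first, last + 1))
    return clusters


if __name__ == "__main__":
    # Toy sequence (not a real nsp3 fragment): one tight C/H group,
    # followed by two isolated C/H residues that do not form a cluster.
    toy = "A" * 10 + "CAAHAACAAH" + "A" * 30 + "C" + "A" * 40 + "H"
    print(find_cys_his_clusters(toy))
```

Clusters found this way are only candidates for metal binding; as the 3Ecto example in the text shows, conservation across viruses still has to be checked separately.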

Figure 3. Interactions and intracellular changes associated with expression of
SARS-CoV nonstructural proteins 3, 4 and 6. Panel A is a schematic view of the
topology and processing of nsp3-6 from polyprotein precursors. The amino-terminal
(3N) and carboxyl-terminal (C) regions of nsp3 are labelled. Panels B-D show putative
nsp-nsp and nsp-host interactions that may contribute to membrane phenotypes and
membrane pairing. Panel E summarizes the effects of SARS-CoV nsp3, 4 and 6
expression on intracellular membrane appearance.

Figure 4. Revised domain-level organization of nsp3 in the Coronavirinae. In
panel A, domain annotations based on the ongoing work of Gorbalenya and
collaborators (1991 and 2003), our previous annotation (2008) and our new revised
annotation (2016) are shown for comparison. Structures that were solved at the time of
domain assignment are shown in red, and domain assignments based on amino acid
alignments are shown in grey. Panel B is based on a multiple sequence alignment of
nsp3 amino acid sequences from the genera Alphacoronavirus (αCoV),
Betacoronavirus (βCoV), Gammacoronavirus (γCoV) and Deltacoronavirus (δCoV).
Four to fifteen nsp3 amino acid sequences are shown per genus, organized into
phylogenetic clusters. Amino acid residues in the alignment have been converted to
colored blocks to show domain-level conservation and highlight areas with clusters of
well-conserved hydrophobic (grey), polar (light blue), acidic (red) and cysteine/histidine
(black) residues. Detailed alignments can be made available by writing to the
corresponding author. Domain designations include ubiquitin-like (Ub1, Ub2),
hypervariable region (HVR) previously described as acidic (Ac), full (PL1pro, PL2pro) or
partial (PL1X) papain-like protease that is missing at least one domain and one
catalytically important amino acid, macro domains Mac1 (previously X or ADP-ribose
1″-phosphatase), Mac2 (previously SARS-CoV unique domain N-terminal or metal-binding
domain MBD), Mac3 (previously SARS-CoV unique domain middle), domain preceding
Ub2 and PL2pro (DPUP), nucleic acid binding (NAB), group specific markers (GSM),
Betacoronavirus specific marker (βSM, previously group 2 specific marker G2M),
Gammacoronavirus specific marker (γSM), large Gammacoronavirus marker (LγM),
transmembrane region (TM1, TM2), nsp3 ectodomain (3Ecto, formerly zinc finger ZF), Y
domain regions Y1 and coronavirus-specific Y domain (CoV-Y). In panel C, grey boxes
show regions of apparent homology between coronavirus nsp3 and the equivalent
proteins of Arteriviridae (Arteri), Mesoniviridae (Mesoni), Torovirinae including
Bafinivirus (Toro), Ball python nidovirus (BPNV) and Bovine nidovirus-TCH5, and
regions present in Arteriviridae but processed into separate polypeptides (Arteri*).

Figure 5. Variability in domain-level conservation in DMO-making proteins of the
Coronavirinae. The thirty-three representative coronavirus protein sequences shown
in Figures 4 and 6 were aligned by Clustal Omega (Sievers et al., 2011). Percentage
amino acid identity (%ID) was calculated between viruses in the same genus or in
different genera. Average percent amino acid identity was calculated from 528 unique
comparisons and plotted plus or minus standard deviation.
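
The figure of 528 unique comparisons follows from the thirty-three sequences: 33 × 32 / 2 = 528 unordered pairs. The sketch below shows one way such a summary could be computed from an existing alignment. It is illustrative only: the legend does not state how gapped columns were treated, so the choice to skip columns where either sequence has a gap is an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import mean, stdev


def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def summarize_identity(aligned):
    """Mean, standard deviation and count of %ID over all unique pairs.

    `aligned` maps a sequence name to its aligned sequence string.
    """
    values = [
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    ]
    return mean(values), stdev(values), len(values)


if __name__ == "__main__":
    # Thirty-three sequences give 33 * 32 / 2 unordered pairs.
    print(comb(33, 2))
    # Toy alignment (not real coronavirus sequences).
    toy = {"s1": "ACDE", "s2": "ACDF", "s3": "ACGF"}
    print(summarize_identity(toy))
```

To reproduce the within-genus and between-genus split described in the legend, the pair list would be partitioned by genus before averaging.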

Figure 6. Revised domain architecture and conservation of Coronavirinae nsp4, 
nsp5 and nsp6. (A) Previous annotations based on the ongoing work of Gorbalenya 
and collaborators (1991) and a new revised annotation (2016) are shown for 
comparison. Structures that were solved at the time of domain assignment are shown 
in red, and domain assignments based on amino acid alignments are shown in grey. (B) 
Annotation of nsp4-6 domains following the style of Fig. 4. Domain designations include 
nsp4 transmembrane regions (TM1, TM2-4), ectodomain (4Ecto) and endodomain 
(4Endo); nsp5 catalytic domains (Dom I and II) and dimerization domain (Dom III); and
nsp6 transmembrane regions (TM1-4, TM5, TM6), ectodomain (6Ecto) and endodomain 
(6Endo). (C) Grey boxes show regions of apparent homology between coronavirus 
nsp4-6 and the equivalent proteins of Arteriviridae (Arteri), Mesoniviridae (Mesoni), 
Torovirinae (Toro), and Roniviridae (Roni). Regions of questionable homology based 
on predicted transmembrane regions or predicted protein secondary structure are 
marked with a question mark. 

Figure 7. Functional map of potential connections between DMO making and 
RNA synthesis based on MHV ts mutants. Mutations linked to temperature-sensitive 
phenotypes, publications describing those mutations and their position in the system of 
cis-acting “cistrons” established by the complementation studies of Sawicki and Siddell 
are shown. Functional domain abbreviations include two papain-like proteases (PL1pro,
PL2pro), a macrodomain (Mac1), main protease (Mpro), RNA-dependent RNA
polymerase (RdRp), and RNA cap-methyltransferases (NMT, OMT). Temperature-
sensitive mutations known to inhibit polyprotein processing by nsp5 Mpro or in proteins
that interact with Mpro are shown in red, numbered from the start of the nonstructural
protein where each is found. Mpro interactions based on mammalian two-hybrid data
(Pan et al., 2008) are shown in orange. Interactions that indirectly link to Mpro via nsp10
are shown in yellow. Proposed interactions among DMV-making proteins are shown
with grey arrows.

Bioinformatics and function of the coronavirus replicative organelle-making
proteins

For the symposium: From SARS to MERS 
Benjamin W. Neuman 

Highlights 

1. Bioinformatics reveals a new domain-level map of coronavirus nsp3-nsp6 

2. Domain-level protein variability is a tool for functional annotation 

3. Ten nsp3 domains are conserved in all known coronaviruses

4. Review of the role of the nsp5 main protease in RNA synthesis 



